MoloVol: an easy-to-use program for analyzing cavities, volumes and surface areas of chemical structures

MoloVol is a free program for calculating volumes and surface areas of molecules and their cavities.


Flow charts of algorithms
The general process of a MoloVol calculation is shown in Fig. 5 of the manuscript. To provide more insight into the details of the calculations, three flow charts are presented in Fig. S1-S3. These cover integral algorithms that are used during the main volume and surface calculation. These algorithms are described in more detail in the manuscript. Figure S1. Flow chart depicting the algorithm that differentiates between atom and probe core type voxels. After the voxel-atom distance is evaluated in (1) and none of the comparisons have turned out true, the voxel is considered "far away from every atom" and assigned probe core type. This algorithm is described in section 4.2.1 of the manuscript. Figure S2. Flow chart depicting the algorithm that differentiates between probe excluded void and probe shell type voxels. Prior to the search for a neighboring core type voxel, the voxel is assigned a default type. The voxel will retain this type if no core voxel is found. The default type depends on whether this algorithm is conducted with the large probe (as part of two-probe mode) or with the small probe. This algorithm is described in section 4.2.2 of the manuscript.

Analytical volume and surface area values for acetylene
The acetylene structure analyzed herein is a linear arrangement of four hard spheres. The Vvdw, Vacc, Svdw, and Sacc values can be readily calculated using formulas for volume and surface areas of spheres and spherical caps. Vmol and Sexc can also be calculated analytically; however, the calculation is slightly more difficult. The geometric solid that needs to be considered for Vmol and Sexc alternates between atom-defined and probe-defined sections along the axis of binding on which the atoms are arranged. A 2D slice of this solid is shown in Fig. S4. The atom-defined sections can be readily calculated using the formulas for spherical caps. Figure S4. Slice of an acetylene molecule. A rolling probe defines a probe excluded surface and a molecular volume. Surface areas and volumes can be calculated with formulas for spherical caps for blue sections (atom-defined), and with formulas for solids of revolution for orange sections (probe-defined).
To calculate the remaining probe-defined sections, it is necessary to derive a function that gives the radius of the solid for a position along the axis of rotation (equivalent to the axis of binding, because the molecule is linear). With this function ( ), the volume and surface area of the section can be calculated using formulas for solids of revolution.
The radius function can be obtained entirely by using trigonometry on the triangle spanned by the centers of a pair of atoms and a probe (compare Fig. S4). For the following equations the xaxis is parallel to the axis of rotation. The origin is placed where the axis of rotation intersects with the line perpendicular to the axis of rotation passing through the probe center. Therefore, the xcoordinate of the probe is 0. With this reference system the radius function is: Here, is the probe radius and is the shortest distance of the probe from the axis of rotation, which depends on the side lengths of the triangle, i.e., the atom-probe distances and the atom-atom distance. Only the integration limits need to be determined. They are, again, fully determined by the aforementioned triangle and therefore easily calculated: 1 = cos( − ) (4) 2 = cos (5) Here, and are the angles of the triangle's vertices centered at atoms and respectively. They can be calculated using the law of cosine.
To calculate the volume and surface area, equation (3) is integrated in equations (1) and (2) respectively using the limits from equations (4) and (5). The calculations need to be conducted for the hydrogen-carbon and the carbon-carbon pair.
The geometrical features of the other structures are too complex to easily calculate volumes and surface areas analytically.

Hardware specifications
Computations were performed on a 13-inch MacBook Pro from Apple, produced in 2020 (uniquely identified as MacBookPro17,1). The central processing unit (CPU) is an ARM-based Apple M1 and contains an in-built graphics processing unit. The computer provides 8 GB of random-access memory (RAM).
Computation results were verified on a second computer equipped with an Intel Core i7-9750H CPU (2.60 GHz) and 16 GB of RAM.
Runtime measurements were conducted using the in-built time-command within Apple's bash shell. For the total runtime, the "user" and "sys" values were added. The result represents the total CPU time and is not affected by idling.
For all programs using voxelated space (i.e., MoloVol, 3V, Platon, and PoreBlazer), the grid resolution (sometimes called grid size or grid step) was set to 0.2 Å. The probe radius was set to 1.20 Å.
Platon, Zeo++ and PoreBlazer can only analyze crystal unit cells. Therefore, isolated molecules such as acetylene were placed in a dummy orthogonal unit cell large enough to let the probe roll freely around the molecule. The total unit cell volume used in some calculations is named Vunit-cell.
A full calculation of MoloVol v1.0.0 was conducted with the following parameters: Include HETATM: no (when applicable) Analyze crystal unit cell: yes for HKUST-1, no for other structures Analyze surface area: yes Enable two-probe mode: no Optimization depth: 4 Vvdw, Vmol, Svdw, Sexc, Sacc, and Socc are all given directly in the report file.
Vacc is calculated as Vvdw + Vvoid + Vshell (all given in the report file).
Element radii were set using the following command (example for carbon) alter (elem C),vdw=1.77 rebuild The following settings were used: VDW.exe -i input.xyzr -g 0.2 (with element radii increased by 1.2 Å in the input.xyzr file)
The element radii were set with the command SET VDWR.
Vvdw was calculated from Vunit-cell -(Vvoid + Vcore + Vshell) The total empty space (Vvoid + Vcore + Vshell) was obtained with the command CALC SOLV GRID 0.2 PROBE 0 Vmol was calculated from Vunit-cell -Vocc The probe occupied space Vocc was obtained with the command CALC SOLV GRID 0.2 PROBE 1.2 Note that a large error was obtained for Vvdw. Thus, it is possible that the program is not designed to process a probe with zero radius.
Loading custom element radii from a .rad file failed. Thus, we hardcoded the element radii in the source code file networkinfo.cc prior to compilation.
Calculations with Zeo++ must specify a sampling number. For optimal results we used a value of 100000 for all calculations. For optimal runtime, we used 100000 only for volume calculations and 2000 for surface calculations.
The probe diameter (2.4 Å) and grid resolution were set in the file default.dat as Nitrogen atom sigma and Cubelet size, respectively. Other parameters from default.dat were kept as provided.
The structures were provided in the XYZ format and the unit cell parameters were set in the file input.dat.
The calculation produced the following values in results.txt:

Computational results
Volumes and surface areas calculated for five structures including small molecules, an open cage compound, a protein, and a porous material (metal-organic framework) with six different programs are reported in Table S1. The metal-organic framework was not analyzed with PyMOL and 3V because these programs are unsuitable to analyze unit cells of crystal structures.   To calculate Sacc, the atomic radii were increased by the probe radius (1.2 Å), then Svdw was computed using the VDW executable. The results shown here were obtained without OpenMP. Figure S8. Breakdown of runtime contributions for Platon calculations. To calculate Vvdw, the volume calculation was repeated with a probe radius of 0. However, it appears that this workaround does not produce reliable results (see Table S1).

Input structures
6.1. Acetylene A model of acetylene was used with atomic coordinates given in Table S2. When a unit cell was necessary, the acetylene molecule was placed in a dummy unit cell of size 101010 Å 3 .

Fullerene C60
A model of fullerene C60 was used with atomic coordinates given in Table S3. When a unit cell was necessary, the C60 molecule was placed in a dummy unit cell of size 131313 Å 3 . The original cif file (CCDC 1893553) was converted to xyz format with the program Mercury 2021.3 to contain a single cage. When a unit cell was necessary, the cage complex was placed in a dummy unit cell of size 30×30×30 Å 3 .

Cytochrome C complex
The crystal structure of the cytochrome C complex was downloaded from the protein data bank (PDB 6S8Y) in the pdb format. HETATM were included for calculations. The file was converted to xyz format with the program Mercury 2021.3. When a unit cell was needed, the protein was placed in a dummy unit cell of size 38×68×45 Å3.

Trypsin protein
The crystal structure of trypsin was downloaded from the protein data bank (PDB 1C2H) in the pdb format. HETATM lines were removed, and the file was converted to xyz format with the program Mercury 2021.3. When a unit cell was needed, the protein was placed in a dummy unit cell of size 60×50×60 Å 3 .
6.6. HKUST-1 The crystal structure of this porous metal-organic framework can be accessed from the CCDC (number 755080). The same structure is provided as an example with PoreBlazer on GitHub repository: https://github.com/SarkisovGroup/PoreBlazer/tree/main/Windows/HKUST1